MCMC

Jesse Brunner

Motivation for MCMC

Many models are too complex for quadratic approximation or other shortcuts.

We need a general method of numeric integration to explore the posterior effectively.

MCMC samples \(\to\) posterior as \(t \to \infty\)

What’s in a name? Markov chain Monte Carlo

Monte Carlo: “repeated random sampling to obtain numerical results” (Wikipedia)

Markov chain: stochastic model in which state at time \(t\) depends only on previous state at \(t-1\)

Imagine a posterior

Imagine this one-dimensional posterior:

  • The x-axis is our parameter, \(\theta\).
  • The y-axis is the posterior probability.
    • actually, is proportional to the posterior
    • we don’t know the probability of the data, \(\text{Pr}(x)\).

Step 1

Imagine we take a guess for \(\theta\). This is the first step of our (Markov) chain.

Step 1

We can then find the height for this guess

height = likelihood of data given \(\theta\) \(\times\) prior of \(\theta\)
= \(\text{Pr}(x|\theta) \times \text{Pr}(\theta)\).
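As a minimal sketch of this calculation (using a hypothetical example not in the slides: 6 heads in 9 coin flips, with a flat prior on \(\theta\)):

```python
import math

# Hypothetical example (not from the slides): 6 heads in 9 coin flips,
# with a flat Uniform(0, 1) prior on theta.
x_heads, n_flips = 6, 9

def height(theta):
    """Unnormalized posterior height: Pr(x | theta) * Pr(theta)."""
    likelihood = math.comb(n_flips, x_heads) * theta**x_heads * (1 - theta)**(n_flips - x_heads)
    prior = 1.0  # flat prior density on (0, 1)
    return likelihood * prior

print(height(0.5))  # 84 * 0.5**9 = 0.1640625
```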

Step 2

Choose a proposal for a new value on the chain.

  • Our proposal distribution is centered on our last guess.
  • It can be any symmetric shape.

Step 2

Find height (posterior-ish value) for our proposal

Step 3

Calculate the ratio of those heights: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta)\text{Pr}(\theta)} = \frac{0.078}{0.07} = 1.124 \]

Why ratios are clever

We want to calculate the ratio of posterior probabilities:

\[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x)}}{\frac{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}{\text{Pr}(x)}} \] but we do not know (and cannot calculate) \(\text{Pr}(x)\).

However, the \(\text{Pr}(x)\) terms cancel out! We just need the numerators: \[ r = \frac{\text{Pr}(\theta_p | x)}{\text{Pr}(\theta_{t-1} | x)} = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})} \]

Step 3

Choose a random uniform variable, \(U \sim \text{Uniform}(0,1)\)

If \(r > U\), accept the proposal & add it to the chain; otherwise, keep the current value. (A proposal with \(r \geq 1\) is always accepted.)
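In code, the accept/reject step is just (a sketch in Python):

```python
import random

def accept(r, rng=random):
    """Metropolis accept/reject: accept the proposal if r > U, U ~ Uniform(0, 1).
    A ratio r >= 1 always accepts."""
    return r > rng.random()

# e.g., accept(1.124) is always True; accept(0.781) is True about 78% of the time
```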

Repeat steps 2 & 3

\[ r = \frac{0.061}{0.078} = 0.781; \quad U = 0.89 \]

Since \(r < U\), reject proposal & keep the current position.

One more time

\[ r = \frac{0.079}{0.078} = 1.004; \quad U = 0.212 \]

Accept proposal & add to chain

Chain so far

Chain after a while

Turn it on its side

It works!

This is the Metropolis Algorithm
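Putting the steps together, a minimal Metropolis sampler might look like this (a sketch assuming a hypothetical 1-D target: 6 heads in 9 coin flips with a flat prior, so the posterior is Beta(7, 4)):

```python
import random

def unnorm_posterior(theta):
    # Hypothetical target: likelihood * flat prior for 6 heads in 9 flips
    if not 0 < theta < 1:
        return 0.0
    return theta**6 * (1 - theta)**3

def metropolis(n_steps, start=0.5, prop_sd=0.1, seed=42):
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n_steps):
        current = chain[-1]
        proposal = rng.gauss(current, prop_sd)  # symmetric proposal (Step 2)
        r = unnorm_posterior(proposal) / unnorm_posterior(current)  # Step 3
        # accept if r > U, otherwise keep the current position
        chain.append(proposal if r > rng.random() else current)
    return chain

chain = metropolis(5000)
print(sum(chain) / len(chain))  # should be near E[theta | x] = 7/11 ≈ 0.636
```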

Did you catch the problem?

What if we get proposals outside the parameter’s possible boundaries (e.g., a negative value for a variance)?

  • lots of rejections
  • inefficient sampling

Solution: asymmetric proposal distribution

However, the proposal distribution is now asymmetric:

\[ \text{Pr}(\theta_{t-1} \rightarrow \theta_p) \neq \text{Pr}(\theta_{p} \rightarrow \theta_{t-1}) \]

Metropolis-Hastings algorithm

Adjust the ratio to: \[ r = \frac{\text{Pr}(x|\theta_p)\text{Pr}(\theta_p)}{\text{Pr}(x|\theta_{t-1})\text{Pr}(\theta_{t-1})}\times \frac{\text{Pr}(\theta_{t-1}|\theta_p)}{\text{Pr}(\theta_p|\theta_{t-1})} \]
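For example (a sketch, not from the slides): with a log-normal proposal for a positive-valued parameter, the Hastings correction must be included explicitly:

```python
import math
import random

def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution at x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma**2)) / (x * sigma * math.sqrt(2 * math.pi))

def mh_step(current, unnorm_post, rng, sigma=0.5):
    # Asymmetric proposal: log-normal centered (on the log scale) at the current value
    proposal = math.exp(rng.gauss(math.log(current), sigma))
    # Metropolis-Hastings ratio: posterior ratio times the proposal correction
    r = (unnorm_post(proposal) / unnorm_post(current)) * (
        lognormal_pdf(current, math.log(proposal), sigma)
        / lognormal_pdf(proposal, math.log(current), sigma)
    )
    return proposal if r > rng.random() else current

# Usage: sample from an Exponential(1) target (unnormalized density exp(-theta))
rng = random.Random(1)
theta = 1.0
draws = []
for _ in range(20000):
    theta = mh_step(theta, lambda t: math.exp(-t), rng)
    draws.append(theta)
print(sum(draws) / len(draws))  # should be near E[theta] = 1
```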

and it works!

But… higher dimensional problems

A proposed jump from one of these points could easily land far from the region of high posterior density.

  • Gets worse with higher N (more parameters)
  • Many, if not most, proposals are rejected = inefficient

Solution: HMC

Hamiltonian Monte Carlo

  • pretends the (negative-log) posterior is a surface
  • simulates a ball rolling along that surface
  • produces proposals informed by topography
  • otherwise like Metropolis-Hastings

HMC

First, take the negative log of the posterior, so areas of high probability become low areas on the surface

HMC

Take the first guess

HMC

…and give it a bit of momentum (in a random direction in parameter space)

HMC

Track its movement for a certain amount of “time” using Hamilton’s equations:

\[ \begin{align} \text{Total Energy} &= \text{Potential} + \text{Kinetic}\\ E &= U(\theta) + K(\text{momentum}) \end{align} \]

After a bit of time, point at end is the new proposal

  • Proposals tend to be in areas with higher probability density

HMC details

For most problems there is no analytic solution to the Hamiltonian, so:

  • solve it numerically over a series of steps (Leapfrog algorithm)
  • Stan will tune step size and number of leapfrog steps
    • too many steps and we end up at the bottom, every time
    • too coarse and we don’t follow the posterior
  • Energy at start and end should be the same if we followed posterior surface
    • gives a warning if these diverge

Proposal acceptance actually reflects the difference in total energy between the start and end of the trajectory (i.e., how well we followed the path of the ball).

So unless there were divergences, the proposal is almost always accepted.

HMC details

Overall, each proposal requires many more calculations (many leapfrog steps, with momentum and kinetic energy calculated at each), but the proposals are much better and rarely rejected, so HMC is much more efficient overall.
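The leapfrog proposal and energy-based acceptance described above can be sketched as follows (a toy one-dimensional version with a standard-normal target, so \(U(\theta) = \theta^2/2\); this is an illustration with hand-picked step sizes, whereas Stan tunes them automatically):

```python
import math
import random

def leapfrog(theta, p, grad_U, eps, n_steps):
    """Numerically follow Hamilton's equations with the leapfrog integrator."""
    p = p - 0.5 * eps * grad_U(theta)      # half step for momentum
    for _ in range(n_steps - 1):
        theta = theta + eps * p            # full step for position
        p = p - eps * grad_U(theta)        # full step for momentum
    theta = theta + eps * p
    p = p - 0.5 * eps * grad_U(theta)      # final half step for momentum
    return theta, p

def hmc_step(theta, U, grad_U, rng, eps=0.1, n_leapfrog=20):
    p0 = rng.gauss(0, 1)                   # give the "ball" random momentum
    theta_p, p = leapfrog(theta, p0, grad_U, eps, n_leapfrog)
    # Accept based on the change in total energy E = U + K; if the path was
    # followed well, the energies match and the proposal is almost always accepted
    delta_E = (U(theta_p) + 0.5 * p**2) - (U(theta) + 0.5 * p0**2)
    return theta_p if rng.random() < math.exp(-delta_E) else theta

# Toy target: standard normal, so U(theta) = theta^2 / 2 (the negative log density)
rng = random.Random(0)
U = lambda t: 0.5 * t * t
grad_U = lambda t: t
chain = [0.0]
for _ in range(2000):
    chain.append(hmc_step(chain[-1], U, grad_U, rng))
print(sum(chain) / len(chain))  # should be near 0, the target mean
```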